perm filename CACHE.7[AM,DBL] blob
sn#427717 filedate 1979-03-25 generic text, type C, neo UTF8
\input paper.tex
\TITL{COGNITIVE ECONOMY}
\vfill
\AUTHO{Douglas B. Lenat, \ Stanford University}
\AUTHO{Frederick Hayes-Roth, \ Rand Corporation}
\AUTHO{Phillip Klahr, \ Rand Corporation}
\vfill
\vskip 10pt plus 2pt
\NNSECP{ABSTRACT}
Intelligent systems can explore only tiny subsets of their potential
external and conceptual worlds. They must develop efficient forms
of representation, access, and operation to increase their
capacities. Some of these forms involve abstraction, caching, and
expectation-simplified processing. These capabilities in turn can
combine to provide extremely powerful increases in performance. For
example, all three can combine to simplify simulation or one of its
related functions, the detection of surprising events. In developing
economic principles for modern AI systems, we are forced to favor
some counterintuitive ideas, policies which are not generally
followed because their contribution to overall cognitive utility is
not apparent. For example, the nonredundant storage of properties
in hierarchical
inheritance nets increases many processing costs while providing
minimal storage cost savings.
\vfill
\eject
\NSECP{INTRODUCTION}
When we build an AI program, we often find ourselves caught in the following
tradeoff: On the one hand, we want to develop our initial systems quickly
and painlessly, since they are experimental and ephemeral. On the other hand,
we want our systems to run as efficiently as possible, to minimize the
temporal delays, to reduce the magnitude of the cpu time and space required.
We are torn between
{\it expressability} and {\it efficiency}.
Many AI researchers {\sl cum} language designers have focused on {\it
expressability}, the problem of rapidly constructing a working
experimental vehicle.\foo{Consider the goals of {\:c SAFE} [Balzer 77],
{\:c PSI} [Green 77], {\:c EMYCIN} [ref], {\:c RITA} [or Rosie?], and
{\:c KRL} [Winograd & Bobrow 77].}
They have produced some excellent software such as
Interlisp's {\:c DWIM}, File, and Break Packages.\foo{[Bobrow & Raphael 73] is
still an excellent discussion of the issues involved in such ``AI language"
efforts.}
Three fundamental techniques
utilized in highly {\it expressive} programs are:
({\it i}) reliance upon a very high level language,
({\it ii}) planning at multiple levels of abstraction, and
({\it iii}) inference by pattern-directed knowledge sources.
This paper attempts to draw attention to the second goal,
{\it efficiency}.
We present techniques which do not sacrifice expressability, yet enable
programs to (semi-)automatically improve themselves and increase their productivity.
The basic source of power is the ability to predict the way that the program will
be used in the future, and to tailor the program to expedite such uses.
The traditional approach to program optimization has assumed that the predictions
are made by the programmer (e.g., by explicitly providing assertions) or can be
inferred statically by
({\it i}) Knuthian and Dijkstraining analysis of the program,
({\it ii}) carefully designing the program's data structures to be appropriate,
and ({\it iii}) compiling (as in the {\:c FORTRAN H} compiler; optimizing
transformations of the program [Burstall & Darlington 73]; etc.).
We advocate the use of methods more {\it dynamic} than those.
One valuable source of prediction comes from the program's experiences as it is run.
If a program can ``learn" from its experiences, it might try applying each piece of
newly acquired knowledge to the task of specializing itself, modifying itself
to run more efficiently in the current runtime environment.
We believe that a program's ``intelligence" can be increased in this way;
that is, by increasing its ability to
acquire appropriate knowledge, to organize that knowledge, and to refine
the conditions under which that knowledge is recognized to be applicable.
Some earlier automatic programming systems [Darlington & Burstall, Lenat's PUP6,
Jim Low, Kant & Barstow,
others?] were designed to improve programs' efficiency, and many of their
ideas have found their way into the techniques we now describe.
Those systems
had program construction, transformation, and optimization as their primary
task; we are proposing general methods to provide
self-improving abilities
to other kinds of AI programs
(speech understanders, theory formers, expert simulators, paraphrasers,...)
Three of the abilities which make a program {\it efficient} are:
\noindent $\bullet $
{\bf Dynamic self-monitoring} (the ability to sense, record, and analyze
dynamic usage) {\bf and self-modification} (the ability to use that
knowledge to redesign/recompile itself with more appropriate
representations, algorithms, data structures; i.e., intelligent learning.)
\noindent $\bullet $
{\bf Caching of computed results}
(storing the results of frequently-requested
searches, so they needn't be repeated over and over again; i.e.,
intelligent redundancy.)
\noindent $\bullet $
{\bf Expectation-filtering}
(using predictions to filter away expected, unsurprising data,
thereby freeing up processing time for more difficult subtasks;
i.e., intelligent focus of attention.)
\yskip
{\it Cognitive economy} is the term by which we describe such heightened
productivity.
Computer programs, no less than biological creatures, must perform
in an environment: an externally imposed set of demands, pressures,
opportunities, regularities.\foo{For a similar recognition of the
importance of the ``task
environment'', see [ref to Newell & Simon's HPS].}
Extending this analogy, we find that
cognitive economy is the degree to which a program is adapted to its
environment, the extent to which
it has found its environmental niche.
\hbox par 2.9in{ % this makes a space for the little figure; see below.
Notice that representing a corpus of knowledge as a minimal
(canonical) generalization/specialization hierarchy, with
interpretatively-computed inheritance, is {\it not} in this category: it
favors expressability, compaction of representation, at the expense of
performance. It is true that a horse is a mammal, and a mammal is an
animal, and from those two we could compute that a horse is an animal (see
Fig. n), but it is more cognitively economical to store one redundant arc
than to recompute it frequently. Psychological studies [ref] indicate
just such redundancies being created and strengthened by repetitions.
}
[[in space above, insert DIAGRAM OF HORSE/MAMMAL/ANIMAL,
WITH REDUNDANT ARC AS A DOTTED ARROW]]
[[more about what cog. econ is -- and is not -- should go here, probably.]]
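The redundant-arc idea in the horse/mammal/animal example can be sketched in a few lines of modern code; the tiny hierarchy and the function name are illustrative assumptions, not any 1979 implementation:

```python
# A minimal sketch of the horse/mammal/animal example: an interpretively
# computed "is-a" chase that caches each derived arc as a redundant link,
# so the two-step inference is paid for only once.

isa = {"horse": {"mammal"}, "mammal": {"animal"}, "animal": set()}

def is_a(x, y):
    if y in isa[x]:
        return True                  # direct (possibly cached) arc
    for parent in list(isa[x]):
        if is_a(parent, y):
            isa[x].add(y)            # store the one redundant arc
            return True
    return False
```

After the first query, `horse --> animal` is a one-step lookup rather than a recomputation, which is exactly the cognitively economical tradeoff argued for above.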
Every scientific theory is constructed in a rich context of surrounding
theories, methods, and standards which determine which experiments are
reasonable ones to perform and which point of view to take when interpreting
the results -- in short, a {\it paradigm}. We feel it useful
to articulate the ``hard core" of our paradigm (the assumptions our theory
rests upon) before delving into more detail about cognitive economy.
For that reason, the entire next section is devoted to a presentation of our
assumptions, including:
a model for intelligence
(Section 2.1), a model for how intelligent computer programs may be
organized (Section 2.2), and a model of the characteristics of present
computing engines (Section 2.3).
Section 3 assumes these models, and contains detailed discussions and
examples of the three main components of cognitive economy:
dynamic self-monitoring and self-modification (Section 3.1),
caching of computed results (Section 3.2),
and expectation-filtering (Section 3.3).
One interesting result that emerges is the prediction that as task size
grows large enough, the techniques which currently are used for expressiveness
(e.g., multiple levels of abstraction) will become important for
cognitive economy -- for efficiency -- as well.
\NSECP{THE ASSUMPTIONS}
The foundation upon which our theories are erected consists of several
kinds of assumptions:
(i) We continually face searches in more or less immense spaces;
intelligence is the ability to bring {\it appropriate} knowledge to bear,
to speed up such searching. Increasing intelligence then comprises
increasing knowledge, its organization, and the conditions for its
applicability. How are these new discoveries made? Empirical induction
(generalizing from experimental and other observations), analogy, and the
criticism of existing theory all lead to new conjectures, new
possibilities.
(ii) Intelligent systems can make the applicability of their knowledge
explicit by representing that knowledge as condition/action rules (If {\sl
situation} then {\sl appropriate reaction}). Such pattern-directed
inference systems (PDIS) can benefit from a schema representation (frame,
unit, being, script, etc.), because this adds structure which the rules
can then take advantage of.
(iii) Computers currently present us with limited cycles, cheap storage,
and expensive knowledge acquisition.
While the reader may wish to turn immediately to a discussion of our
detailed cognitive economy ideas (Section 3), it is useful to examine the
above assumptions in more detail first.
\SSEC{Our Model of Intelligence}
Many human cognitive activities, including most of those commonly referred
to as ``requiring intelligence", can be cast as searches, as explorations
through a search space, meandering toward some goal. We call a
problem-solver more intelligent if he can efficiently work toward a
solution even in the face of an immense search space and an ill-defined
goal. Somehow, he is imposing more structure on the problem, and using
that to search more effectively. How might he do this?
According to our model of human intelligence, he accomplishes his task by
bringing knowledge to bear, knowledge not supplied explicitly in the
problem statement. This knowledge can be about problem-solving in general
(e.g., how to recognize and break cultural blocks) or about the problem's
domain specifically (e.g., which groups of atoms can usually be treated as
superatoms). It may have been learned long ago, or generated during the
early phase of work on the problem.
This implies that a problem solver can become more effective by (i)
increasing his knowledge, (ii) better organizing his knowledge, and (iii)
refining his criteria for deciding when various pieces of knowledge are
applicable. In terms of production (If/Then) rules, these correspond to
adding new rules, modifying the data structure in which rules are held,
and modifying the conditions (IF parts) of existing rules.
How is new knowledge discovered? One route is that of {\it abstraction}:
condensing many experiences into a heuristic which, in hindsight, we see
would have greatly aided us in the past, which would have speeded up our
reaching this state in the search. Closely related is {\it
generalization}, often merely expanding the domain of relevance of a
specific bit of knowledge we already possessed. {\it Analogy} is one of
the most useful techniques; one can draw parallels not merely to other
existing facts and heuristics, but also to the {\sl paths} which led to
their discovery (e.g., even if a result in organic chemistry does not map
over into the inorganic world, the experiment which led you to that first
fact may map over into an experiment which {\it will} reveal a useful fact
in inorganic chemistry; even though the analogue of a mathematical theorem
may be false in some new domain, the analogous proof technique may be
useful). Another path to discovery is {\it specialization} of more general
knowledge. Finally, we must mention the process of {\it hypothesis,
criticism, and improved hypothesis}, which is a common and powerful method
of spiralling in toward precision. In mathematics [Lakatos] and many
scientific disciplines [Musgrave & Lakatos], counterexamples to current
theories arise frequently, and their incorporation often demands a
deepening of the theory, an increase in knowledge.
Experiments such as the AM program [ref] support our assertion that these
methods can adequately guide even open-ended searches for new definitions
and conjectures. But how can an intelligent system be programmed, how can
such systems be organized?
\SSEC{Our Model of Intelligent Program Organization}
The above remarks about intelligent problem solving apply equally to
hardware and wetware alike. Computer programs which are to be intelligent
must ultimately grapple with the tasks of knowledge acquisition,
representation, and refinement. We cannot provide an absolute answer to how
they should be organized, but we can posit some design constraints which
have proven useful so far.
A very general heuristic in AI programming is the following: If your
program is going to modify its own {\bf X}, then make {\bf X} as
separable, clean, and explicit as possible. In our case, {\bf X} can be
instantiated as "knowledge", or as "applicability conditions for each
piece of knowledge". Thus the heuristic advises us to represent our
knowledge in a separate, clean, explicit form, say as knowledge modules
having some fixed structure, and also advises us to keep the applicability
conditions for each knowledge module separate from the rest of the
knowledge it contains.
This naturally leads us to a pattern-directed inference system, in which
knowledge is broken into separate modules, each with an explicit set of
relevancy tests. Such systems arising in Pittsburgh may overemphasize
cleanliness (so-called pure production systems), while those arising in
California may tend to be a bit too laid back (so-called ad hoc expert
systems), but such variations should not obscure their common source of
power. The dominant PDIS architecture has knowledge broken into a set of
condition/action productions, rules of the form IF {\sl triggering
situation} THEN {\sl appropriate reaction}.
Having a clean representation for rules means having a clean, precise,
elegant language in which to express them. By structuring the conceptual
knowledge of the system, by partitioning each module's knowledge into
several categories, a rule can condense an entire cumbersome description
into a single pointer. The popular schematized representations of
knowledge (scripts for episodes, frames for static situations, beings for
specialists, units for everything) enable a rule like "If there are no
known methods specific to finding new examples of prime numbers, Then..."
to have its condition coded as a simple null-test on the To-get subslot of
the Examples slot of the schema for Prime Numbers. By a judicious choice
of slot types and subslot types, the system builder can reduce most
triggering conditions to such quick checks on the state of various
(subslots of) slots of schemata.
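As a concrete sketch (the schema contents and helper name are hypothetical), the Prime-Numbers schema and its compiled-down triggering condition might look like:

```python
# Hedged sketch: a schema as nested dicts, with the rule condition "there
# are no known methods specific to finding new examples of prime numbers"
# compiled down to a null-test on the To-get subslot of the Examples slot.

primes = {
    "Examples": {
        "Known": [2, 3, 5, 7, 11],
        "To-get": [],            # methods specific to finding new examples
    },
    "Generalizations": ["Numbers"],
}

def no_specific_methods(schema):
    # the entire cumbersome description condensed to one quick check
    return not schema["Examples"]["To-get"]
```

The judicious choice of slot and subslot types is what lets the whole English condition collapse into that single null-test.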
Additional knowledge is required to efficiently locate all the rules which
{\it might} have their conditions satisfied in a given situation, and also
to decide which rules to actually fire (obey) among those whose IF parts
have actually triggered (been satisfied).
The location of potentially-relevant rules is facilitated by organizing
the rules into some structure. For example, AM [ref] organized
mathematical concepts into a generalization/specialization hierarchy, and
tied each rule to its most relevant concept. To find rules potentially
relevant to concept C, AM then needed only to look at the rules tacked
onto C, C's generalizations, {\it their} generalizations, and so on. By
inheritability, any rule relevant to judging Objects in general was
(potentially) relevant to judging an Abelian group; any technique for
estimating the value of any function was relevant to estimating the value
of Ackerman's function.
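This rippling can be sketched directly; the three-concept hierarchy and rule names below are invented for illustration, not AM's actual data structures:

```python
# Ripple up the generalization hierarchy, collecting the rules tacked onto
# the concept, its generalizations, THEIR generalizations, and so on.

generalizations = {"AbelianGroup": ["Group"], "Group": ["Object"], "Object": []}
rules = {"AbelianGroup": [],
         "Group": ["check-group-axioms"],
         "Object": ["judge-any-object"]}

def potentially_relevant(concept):
    found, frontier = [], [concept]
    while frontier:
        c = frontier.pop()
        found.extend(rules.get(c, []))
        frontier.extend(generalizations.get(c, []))
    return found
```

By inheritability, a rule for judging Objects in general turns up when judging an Abelian group, just as the text describes.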
This requires an explicit focusing of attention, a selection of a "current
topic" C from which the above rippling proceeds. We favor the maintenance
of a separate data structure for this purpose, an agenda of topics to
consider, of subtasks to work on. Knowledge may be brought to bear to
select a topic, then the above quest for potentially relevant rules is
carried out, then they are examined to find the truly relevant satisfied
set, and finally some subset of those are fired off.
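The control cycle just described reads naturally as a small loop; `gather`, `satisfied`, and `fire` stand in for machinery the text leaves open, so this is a sketch of the architecture rather than any particular system:

```python
# Agenda-driven control: pick the most promising topic, ripple for
# potentially relevant rules, test their IF parts, then fire some subset.
import heapq

def run(agenda, gather, satisfied, fire, steps=10):
    for _ in range(steps):
        if not agenda:
            break
        _, topic = heapq.heappop(agenda)       # select a "current topic"
        candidates = gather(topic)             # potentially relevant rules
        for rule in (r for r in candidates if satisfied(r, topic)):
            fire(rule, topic)                  # may add new agenda tasks
```

Keeping the agenda as a separate data structure is what lets knowledge be brought to bear on topic selection itself.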
\SSEC{Our Model of (Present) Computers}
Since we are considering the problem of building computer models of
intelligent behavior, many of our assumptions deal with the
characteristics of present-day computers. They are the symbol manipulators
we use, yet they were not designed for that general purpose.
[RICK: FILL THIS SECTION OUT;
2 BEAUTIFUL PARAGRAPHS WILL DO IT. BASIC IDEA WAS:
Computers currently present us with limited cycles, cheap storage,
uniprocessing, and expensive knowledge acquisition.]
\NSECP{DETAILS OF THE SOLUTION}
\SSEC{SELF-MONITORING \& MODIFICATION: INTELLIGENT LEARNING}
\SSSEC{DYNAMICALLY MONITORING ITS TASK ENVIRONMENT}
\SSSEC{MODIFYING ITS DATA STRUCTURES AND REPRESENTATIONS}
\SSSEC{MODIFYING THE CONTROL STRUCTURES AND ALGORITHMS}
\SSSEC{FINDING THE PROPER LEVEL OF ABSTRACTION}
A train accident occurs, and over two hundred people are injured; the
only large nearby hospital is called, and asked if they can handle the load.
The administrator queries his shiny new simulator, and it begins to chug
away. It looks over the state of each current patient, each physician,
each hospital resource. It begins simulating the arrival of the wounded,
their movement and temporary storage in hallways, etc. After quite a while,
it shows that the hospital will be saturated and unable to care for any more
incoming patients. The maximum number they can accept is 157.
If "quite a while" is several hours (which it very likely will be), this is
simply inappropriate. There is a pressing need for a rough answer quickly;
a {\it human} hospital administrator would answer right away, or after a
very brief survey of a few parameters, and our belief is that {\it so should
an intelligent program}.
A colonel monitoring our defense systems detects a fleet of enemy aircraft
attacking in an unexpected pattern. Can our bombers and missiles meet this
new attack mode? The colonel will know precisely how much time is available
to answer that question, and it will be a matter of seconds or minutes, not
of hours or weeks. A detailed simulation program will be of no use to him.
Reasoning will go on at a high level of abstraction, and an approximate answer
will be forthcoming (either from a program capable of such inference, or
-- in lieu of that -- in the colonel's head).
[what are some other possible examples to use here? We should pick say two,
and use the same ones throughout the paper, whenever possible.]
In many real-life problem solving situations, the "goal" is not a single,
precise answer. Rather, there is a cost/benefit tradeoff between the
accuracy of the solution and the amount of resources consumed in producing
it. As with any such tradeoff, there is typically a point of diminishing
returns. Computer programs which are designed only to return exact answers
will then be ill-suited to the needs of the various situations in which they
will be called upon. One area in which this is recognized in AI is
game-playing. The concept of truncated search is fundamental in all chess
programs, yet very little of the same philosophy has permeated to other kinds
of AI programs.
[this last para. needs reworking; my analogy to truncated search may confuse
more than illuminate.]
But this is very strange: we earlier said that reasoning at multiple levels of
abstraction was a technique for heightening expressiveness, not efficiency.
Is this such a powerful ability that it helps improve both simultaneously,
or is something else going on? The answer lies, significantly, in the
nature of the task environment. Multiple levels of abstraction cost a certain
amount to implement, both initially (man-hours of programming) and as the
program runs (cpu time spent maintaining the redundancy, choosing which level
to work on next, etc.) For problems of all sizes, this feature aids in the
expression of new problems to the program, since they may be input at whatever
level(s) seem appropriate to the user. As the graph below shows, then,
MLA will always add to expressiveness, and will either detract or add to
efficiency depending upon how small or large the task is.
There is a more general result here: the methods which humans find useful for
easy expression of problems may be useful to programs attacking very large
problems. For instance, consider the decision about whether to do inference
by a system of pattern-invoked experts, or rather to preprocess the triggering
conditions into a big discrimination network. The former approach is more
expressive, the latter more efficient -- at least for small problems. For
very large tasks, the need to remain flexible grows very large. It is then
more economical to retain pattern-directed control (the compilation of the
patterns is so complex the whole procedure would have to be redone
frequently during the course of execution, if we opted for that alternative
instead.)
(see above) Desirability of being able to compute rough answers cheaply
(see above) Simulation
Conceptual hierarchies
Hier. in general: possible organizing relations
Supervises, commands, controls, calls (as in subroutines)
Containment: part-of versus specialization-of versus
application-of versus element-of
Inheritability: the importance of "or any ancestor"
The notion that the hier. is a compact way of storing
a huge set of assertions, is fairly efficient
vis a vis retrieving / testing for them, and
is much much more efficient for updating them.
This discussion should be general to all above
organizing relns, not just genl/spec hiers.
Make sure to distinguish our meaning for abstraction
(redundant, generalized version) from the other
common meaning (inducing a general case from examples)
Heuristic Hierarchies
Each heuristic H has a domain of relevance
More precisely, H's relevance is a function of task posed,
and is high for some (it domain of relevance),
low for many more, and perhaps negative for others.
Heuristics can be related to each other by genl/specialization
[example: "If P(x) is often false, weaken P" -->
Also an example involving sim.]
Heuristics can be object-centered (tacked onto relevant
concepts) and can therefore inherit THEIR relationship
[an ex. of that.. This is "Inheritability" in AM
sense.]
Interpretation and planning at levels of abstraction
Eg., rules of bomber simulation at difft levels
(this example will ultimately be used to suggest
caching for simplification)
[this ties in with the last point about heur. hier.]
Part of this para. might be this example:
"In the short run, what would happen if we didn't take any defensive
actions in a full-scale enemy attack situation?" A strategist can
answer this instantly, to the effect that the enemy missles and bombs
would then reach their targets, destruction would be nearly total, our
capacity for retaliation would be severely reduced, and so on. He can
say this without factoring in the individual changes in aileron
attitude throughout the flight of each bomber.
He has a high-level "script" which specifies defensive action against an
attack, with a small note of explanation (e.g., "else destruction"), and
that's as detailed as he needs to get to answer the question.
Now consider a question like "What if we had a fleet of 200 B1 bombers
in addition to our current forces?" The strategist may again be capable
of giving a quick estimate of the effect, but it's possible that a more
detailed analysis would reveal some cusp, some nonlinearity not apparent
to him through simple extrapolation, through the application of general,
high-level rules of thumb. One possibility is that his old, general,
high-level scenario could be used as a plan, as a set of general expectations
which guide the detailed simulator, which predict which aspects of the
world will benefit from much more detailed simulation (e.g., encounters where
our bombers hitherto failed to reach their targets), and which areas will
probably be unaffected by the change in the world (e.g., attacks on our
coast by enemy submarines).
[note that we can easily extend this discussion a bit for caching: keep
around the old simulation, at various levels, and only re-run those parts
which might be affected by the change in initial situation, and even then
only rerun it at the highest level possible. This latter decision is probably
always made locally: run until the results stop changing, then you know
you've expanded down one level too far, so you can back up and stop.]
Related research
Include mention of psych. studies showing that people may
reason at the word level, rather than more primitive
level, because the high-level word manipulation
process is more efficient than the manipulation of
large, detailed structures (such as huge sem. nets).
\SSEC{CACHING: INTELLIGENT REDUNDANCY}
Modifying memory to save computed results to speed subsequent accesses
Simplest case: After computing F(x), remember it.
Also important: After computing F(x), remember the PATH
i.e., the way you computed it, the traps, the blind
alleys, the overall type of successful approach, etc.
Analogue in chess: first hinted at by Newell, Shaw and Simon
in their early chess paper, then elaborated by Berliner
in CAPS: the idea that you should abstract out a
description of a path through the search tree, so
that the remainder of the search could partake of it.
Generalization of hardware concept
What more to say than the name, the hardware defn and uses,
and the obvious analogy to our above proposal.
Perhaps here is a good place for some rough calculations
which show the extra cost and benefits in some sits.
In which case, use the same few situations all through
the paper as running examples.
EURISKO types of caching, as first examples
This means a diversion into representation of concepts as
slots and subslots (facets and subfacets).
Two types of slots: elementary and virtual.
Blend them together: a cached virtual slot appears much
like an elementary primitive slot.
Note this is most applicable when the values are slowly-changing.
This is ideal for the defns. of slots.
This enables the program to occasionally change a slot's
definition and yet (except right after such a
change) run as efficiently as if that capability
were not there.
Analogous to COMPILING.
Both of functions and of definitions.
Make sure it's clear that caching of defns. is not in any
way a special feature: any facet can be cached.
Options involved:
The basic idea is that a request comes in with
some description of the cost/benefit curve
(for each amount of resources expended and each
degree of precision of the result, specify how
acceptable that resource consumption / accuracy of
solution pair is.) One extreme of this is to provide
an ironclad limit on resources (Bobrow's resource-
limited computation), and say "Do the best you can
within these time/space/... limits." Another extreme
(most common in present-day programs) is to specify
an ironclad goal criterion ("An answer which is at
least this good, no matter how long it takes to get.")
We are making the reader aware of the space in between
the two extremes.
Now that we went up to lofty heights, come back down.
In Eurisko, we linearized this space. We picked a
few telling parameters of it, and said that a description
was merely a list of values for these parameters.
Time, space, user communic, degree of precis.
Go through a couple examples in detail.
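One such example, in the linearized form just described: a request carries a few resource/precision parameters, and the cache answers only when it meets them. All names here are illustrative, not Eurisko's actual code:

```python
import time

def cached_call(cache, key, recompute, min_precision=0.0, time_limit=None):
    """Answer from the cache when the stored result meets the request's
    precision demand; otherwise recompute within the given limits."""
    hit = cache.get(key)
    if hit is not None and hit["precision"] >= min_precision:
        return hit["value"]
    value, precision = recompute(time_limit)
    cache[key] = {"value": value, "precision": precision, "when": time.time()}
    return value
```

Note the in-between territory this occupies: neither an ironclad resource limit nor an ironclad goal criterion, but a description of how acceptable each consumption/accuracy pair is.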
Contrast with psychological conjectures of cognitive economy
(e.g., Collins & Quillian, KRL, ...). More like the $HR^2$ Plasticity
model of storing all retrieved paths as direct links
[I can guess what this means, but you (Rick?) should do this
section.]
General principles
Note that this is in general a heuristic rule driven decision-
making process, and what we have specified below are just
some of the relevant heuristics. Many of the more general,
commonsense ones are absent, and even the ones present are
not specified all the way down to the operational (LISP) level.
If a program had these rules, the categories below would be
the various kinds of subslots allowable (differentiable).
Note that there must exist heuristics for determining the
applicability of the following, as a function of machine
architecture. Equivalently: we haven't fully worked out
the left hand sides (domains of relevance) of the rules.
Updating Principles
-------------------
When
If the curr. request's description specifies that a more
precise answer is needed than the one cached.
Only if the resource alloc. is high enough to get a better
or more current answer than the one cached already.
Both of these mean: If you've gotten a better value to cache.
Of course, if there is no cached answer available, update = write.
Typically: the cached value was F(x), and x changed,
or the definition of F changed.
Why
Frame problem: the value is old, and the world (may've) changed.
A new value is computed (either intentionally or accidentally)
and its current-ness makes it "better" perhaps.
How
Set demon traps that flag the cache as out of date.
The user requests updating if the cache seems stale.
Usually: discard the whole old value, store just the new one.
If the program is very smart: When the world changes, try to
propagate that change through the network of caches,
changing THEM. This is much better than erasing them!
Give an example of this in action (prob. fictitious).
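A fictitious example of the sort the note asks for: each cache records the facts it depends on, and a world change repairs dependent caches (transitively) instead of erasing them. All names and values here are invented:

```python
# Propagate a world change through the network of caches, recomputing
# dependents rather than discarding them.

caches = {
    "area": {"value": 12,  "depends": ["width"]},   # area = width * 4
    "cost": {"value": 120, "depends": ["area"]},    # cost = 10 * area
}

def propagate(changed, recompute):
    for name, c in caches.items():
        if changed in c["depends"]:
            c["value"] = recompute(name)
            propagate(name, recompute)  # ripple onward through the network
```

When "width" changes, "area" is repaired, and that repair in turn repairs "cost"; nothing is merely erased.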
Where
In what form
What
When not to
This should be merged with "When to"
How to
How is this different from "How", above?
Storage Principles
------------------
When
Every time you have to call a lower order function to eval. it
& it took quite a while.
You've called it before, recently, and the value didn't change.
If the amount of processing to get it was very high.
Esp: after a lucky accident, where a repeat might
be very costly or even fail entirely to duplicate
your recent success at getting a solution.
If (freq of usage)x(avg cost to recompute)/(freq of change)
is high, then it pays off.
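The payoff test in that last item is simple enough to state directly; the threshold is an assumed tuning parameter, not something the notes specify:

```python
def cache_pays_off(freq_of_usage, avg_cost_to_recompute, freq_of_change,
                   threshold=1.0):
    # (freq of usage) x (avg cost to recompute) / (freq of change)
    return freq_of_usage * avg_cost_to_recompute / freq_of_change > threshold
```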
Why
Cost of recomputing vs. cost of storage.
Context of subsequent calls is similar enough (e.g., the same
arguments will come up again).
Related to "When"; namely: in those situations, it's definitely
cost-effective to store a cache.
Analogy to hardware utility.
How
Called functions might suggest how to cache their value in higher
calling caches (e.g., my value changes often so cache my defn.).
Cache should be transparent & discardable (should be able to throw
them all away if space needed).
Make sure to store a description of the CALL which led to
this value being computed. I.e., make sure you
store the amount of resources brought to bear, the
time (to tell freshness later on) of day, whether
any other caches were used in this search, etc.
If possible, predict (criteria, time-limit, etc.) under
which this will no longer be current (eg, depends
on defn of Y, depends on there being no exs. of Z...)
Where
The value to be cached should have a precise storage place.
One extreme is an assertional database, with NO structure.
We prefer to add slot and subslot structuring, with some
domain-specific tinkering of those two sets (esp the former).
Thus an extreme example of prime numbers should be stored
on a slot called extreme-examples, on a schema called primes.
At access time, we look in the precise spot(s) where X would
be, and if it isn't there, we compute and cache it right there.
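That access-time behavior reads naturally as a three-line sketch (a hypothetical helper, not a quoted implementation):

```python
def get_slot(schema, slot, compute):
    """Look in the one precise spot where the value would be; on a miss,
    compute the value and cache it right there."""
    if slot not in schema:
        schema[slot] = compute(schema)
    return schema[slot]
```

After the first access, a cached virtual slot is indistinguishable from an elementary one, which is the point made earlier about blending the two.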
In what form
Value? A partially evaluated expression? A symbolic expression?
At what level of abstraction?
Stack previous values to enable you to tell if they're changing.
What
You store a flag saying you've been here before.
When it was computed.
How much effort was expended on it, down to what levels of
algorithms, with what other caches incorporated.
Certainty of the result.
When not to
The value changes too frequently.
The value is so critical that the cost-benefit criteria
(eg the resource limits are enormous) always favor
recomputation rather than chancing a blunder.
The function evaluates as fast as the caching mechanism itself.
Space is too tight.
In general, these are all related to When (above).
How to eliminate caches
Space tight --> eliminate least recently used caches (least
recently referenced).
If freq. of use is negligible:
just reverse the formula in "When" (above), to see if it
is no longer cost-effective to keep the cache.
Remember that a value will occupy varying amounts of space.
AM, e.g., eliminated large cached values first.
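A sketch of such an eviction pass, assuming each cache entry records its size and last reference time (the field names are ours); like AM, it discards large values first, breaking ties toward the least recently referenced:

```python
# Hypothetical eviction: when space is tight, free at least `space_needed`
# by discarding caches -- largest values first (as AM did), and among
# equals, the least recently referenced.
def evict(cache, space_needed):
    freed = 0
    order = sorted(cache, key=lambda k: (-cache[k]["size"],
                                         cache[k]["last_ref"]))
    for key in order:
        if freed >= space_needed:
            break
        freed += cache[key]["size"]
        del cache[key]
    return freed
```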
It's also a good place, here, to present the progression of caching:
This shows how "heuristics" are caches of experience, too.
ACCESS ==> CALCULATE ==> DEDUCE ==> INDUCE BY OPEN-ENDED SEARCH
       \  /          \  /       \  /
      cache       thm, alg   heur rule
[e.g., a theorem or algorithm is one way to simplify a deduction, to reduce
it to mere calculation. Heuristics are capable of reducing discovery tasks
to mere symbolic manipulation, to mere deductive computation (as in AM); and
caches are capable of reducing calculations to mere lookups.]
MAYBE make some sense of this and put it in:
Save redundant genl/spec links; save values of virtual slots; save multiple
instantiations of the same rule at different levels of abstraction;
record condensations of many experiences as new heuristic rules.
[tradeoffs: size of KB; updating problems; precision in reasoning one's
way through delicate inference chains efficiently; synthetic reasoning;
rule interaction]
Mention the psych. studies that show that people do develop direct associations
between words that are co-retrieved often. Thus dog and animal are closer
than dog and mammal, even though dog --> mammal --> animal in the hierarchy.
\SSEC{EXPECTATION-SIMPLIFIED PROCESSING: INTELLIGENT FOCUS OF ATTENTION}
Central notion: reserve your computing for opportunities to realize
the potential for expanding knowledge.
I.e.: be willing and able to add new facts,
but be eager to add surprising new facts.
Equivalently: by being guided by expectations (scripts, frames),
you will find yourself zipping through 90% of what
comes in from the world, and spending most of your
processing time on the remaining puzzles and anomalies.
You may decide how much to expend re-confirming the expected.
You may decide how much time to spend deciding how much time...
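This expectation-guided filtering could be sketched as follows (the handler functions and event representation are assumptions for illustration):

```python
# Zip through input that matches expectations with a cheap confirmation,
# and spend the real processing effort on the surprises.
def process(events, expected, cheap_confirm, expensive_analyze):
    surprises = []
    for event in events:
        if event in expected:
            cheap_confirm(event)        # ~90% of input: confirm and move on
        else:
            surprises.append(event)     # anomaly: focus attention here
            expensive_analyze(event)
    return surprises
```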
Reductions realizable through expectations:
Perceptual set: see what you expect with less effort
Surprise: heighten sensitivity to data inconsistent with
expectations
Perhaps mention analogy to frog's visual system.
Predict and prepare
This includes a whole continuum, growing more
flexible, more descriptive (less scalar), dynamic:
> Store a value, by binding a variable to it, so you can
later refer to it (retrieve it) by name.
> Cache the value of a function-call, so if you ever want
to find F(x), first you look to see if that value is
already computed and stored away, else you compute it.
Note that this is much like hashing, like associative
memory: the idea that you have a place where F(x)
should be (the hash of the string PARENTS(FRED), eg)
and you look there first. This is just a genl. of
the previous ">" about binding a variable to a value.
> After finding F(x), store the information that would
enable you to recompute it rapidly. Hopefully, this
will also help you compute F(y) (either for all y
or at least for some y's similar to x), and possibly
will even help you compute G(x), where G is similar
to F (and, much more rarely, G(y)).
The idea here is to cache an algorithm, or at least
constraints on a search, for finding F(x).
One very special case of this is normal compiling:
the compiled code for F is just a more efficient way
of finding values of F. As with our above remarks, one
often finds that one's code worked interpreted but not
compiled. This is considered "wrong" (i.e., a bug in the
compiler), but is much more in line with our general
idea of mere assistance, which is sometimes quite
useful but may sometimes not help you at all.
> Synthesize and store away a partial model of the world.
This may include a (sub)network of frames, which
together capture enough about the situation that
in the future its relevance will be recognized, and
some aspects of this situation will help guide future
processing on some problem (which is recognized as
similar). This is much like analogy, like real
"learning" (storing away experiences).
What mechanisms are implicated?
Caching
This should also encompass normal compiling of subroutines.
PDMs (as triggers or demons)
The analogy: you hear someone yell "Fire!" in a theater.
Also: mention the basis for much of humor: the punchline.
The intentional misleading initially (wrong LHS)
An unexpected reaction (wrong RHS)
This may even provide a start at a taxonomy of
humor, one which could be operationalized.
Relevance to learning
Confirm or disconfirm predictors
This requires setting up PDMs to fire on dis/confirmation
Yes, the idea is that when an expectation is not met, we
also (have some chance of) modifying the culprit
rule(s) that made that false prediction (and/or
adding new rule(s) which would make the right one).
This is v. similar to Mycin's task, and the TEIRESIAS
approach is worth mentioning.
This ties learning in very closely to scientific theory
formation (at least those theories of knowledge which
are activist, revolutionary conventionalist,
methodological falsificationist, and sophisticated falsificationist).
[those are the nodes, descending, in a genl/spec hierarchy
of theories of knowledge]
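One way to sketch this dis/confirmation demon: when a prediction fails, lower the strength of the culprit rule, flagging it for repair. This is a toy stand-in for TEIRESIAS-style rule repair; all names and the strength increments are assumptions.

```python
# Hypothetical demon fired on dis/confirmation of a prediction: confirmed
# expectations strengthen the rule that made them; disconfirmed ones
# weaken the culprit rule (a candidate for modification or replacement).
def check_prediction(rule, predicted, observed, strengths):
    if predicted == observed:
        strengths[rule] = strengths.get(rule, 0.5) + 0.1
    else:
        strengths[rule] = strengths.get(rule, 0.5) - 0.2
    return strengths[rule]
```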
\NSECP{COGNITIVE ECONOMY REVISITED}
[let's leave this section until last, till the others are written out]
Sample problem: using a world model (simulator) to answer questions
(e.g., what'd happen if 100 bombers went in there?)
Representation of this knowledge as PDMs at different levels of abstraction.
Ability to generalize and cache results at one level at the
next higher level,
e.g., either as numerical tables, statistical distributions, or
symbolic expressions.
Ability to answer some questions appropriately for the requestor
at a high level of abstraction
KB Design
One good reason to use inheritance is to speed knowledge
implementation, not computing performance.
Using the system should result in its speedup.
Storage should be cheap.
Machine architecture
PDI should be cheap
PDMs should be scheduled with variable resources and
should be able to spend effort accordingly
How could propagation of changes be made efficient?
\NNSECP{Acknowledgements}
We wish to thank...
\vfill\end